Abstract

We propose new estimates for the frontier of a set of points. They are defined as kernel
estimates covering all the points and whose associated support is of smallest surface. The
estimates are written as linear combinations of kernel functions applied to the points of
the sample. The coefficients of the linear combination are then computed by solving a
linear programming problem. In the general case, the solution of the optimization problem
is sparse, that is, only a few coefficients are nonzero. The corresponding points play
the role of support vectors in statistical learning theory. The $L_1$ error between the
estimated and the true frontiers is shown to converge to zero almost surely, and the
rate of convergence is provided. The behaviour of the estimates in finite sample situations
is illustrated on some simulations.
1 Introduction

Many proposals are given in the literature for estimating a set $S$ given a finite random
set of points drawn from its interior. This problem of edge or support estimation arises
in classification (Hardy & Rasson [21]), clustering problems (Hartigan [25]), discriminant
analysis (Baufays & Rasson [3]), and outlier detection. Applications are found
in medical diagnosis (Tarassenko et al. [22]) as well as in condition monitoring of machines
(Devroye & Wise). In image analysis, the segmentation problem can be
considered from the support estimation point of view, where the support is a convex
bounded set in $\mathbb{R}^2$ (Korostelev & Tsybakov [30]). We also point out some applications
in econometrics (e.g. Deprins et al. [10]). In such cases, the unknown support can
be written

$$S = \{(x, y) : 0 \le x \le 1 ;\ 0 \le y \le f(x)\}, \tag{1}$$

where $f$ is an unknown function. Here, the problem reduces to estimating $f$, called the
production frontier (see for instance Härdle et al. [21]). The data consist of pairs $(X, Y)$
where $X$ represents the input (labor, energy or capital) used to produce an output $Y$ in
a given firm. In such a framework, the value $f(x)$ can be interpreted as the maximum
level of output which is attainable for the level of input $x$. Korostelev et al. [28]
assume, on economic grounds, that $f$ is increasing and concave, which suggests
an adapted estimator, called the DEA (Data Envelopment Analysis) estimator. It is the
lowest concave monotone increasing function covering all the sample points. Therefore it is
piecewise linear and, to our knowledge, it is the first frontier estimate computed by means
of a linear programming technique (Charnes et al. [7]). Its asymptotic distribution is
established by Gijbels et al.

An early paper was written by Geffroy [13] for independent identically distributed
observations from a density $\phi$. The proposed estimator is a kind of histogram based on the
extreme values of the sample. This work was extended in two main directions.

On the one hand, piecewise polynomial estimates were introduced. They are defined
locally on a given slice as the lowest polynomial of fixed degree covering all the points in
the considered slice. Their optimality in an asymptotic minimax sense is proved under
weak assumptions on the rate of decrease $\alpha$ of the density towards 0 by Korostelev
& Tsybakov [30] and by Härdle et al. [22]. Extreme value methods are then proposed
by Hall et al. [20] and by Gijbels & Peng to estimate the parameter $\alpha$.

On the other hand, different propositions for smoothing Geffroy's estimate were made in
the case of a Poisson point process. Girard & Jacob [18] introduced estimates based on
kernel regressions and orthogonal series methods [16, 17]. In the same spirit, Gardes [12]
proposed a Faber-Schauder estimate. Girard & Menneteau [19] introduced a general
framework for studying estimates of this type and generalized them to supports of the form

$$S = \{(x, y) : x \in E ;\ 0 \le y \le f(x)\},$$

where $f$ is an unknown function and $E$ an arbitrary set. In each case, the limit distribution
of the estimator is established.

We also refer to Abbar [1] and Jacob & Suquet, who used a similar smoothing
approach, although their estimates are not based on the extreme values of the Poisson
process.

The estimate proposed in this paper can be considered to belong to the intersection of
these two directions. It is defined as a kernel estimate obtained by smoothing some
selected points of the sample. These points are chosen automatically by solving a linear
programming problem to obtain an estimate of the support covering all the points and
with smallest surface. Its advantages are the following: it can be computed with standard
optimization algorithms (see e.g. Bonnans et al. [5], chapter 4), its smoothness is directly
linked to the smoothness of the chosen kernel, and it benefits from interesting theoretical
properties. Here, we prove that it is almost surely convergent for the $L_1$ norm. The
estimate is defined in Section 2. Its theoretical properties are established in Section 3.
The behaviour of the estimate is illustrated in Section 4 on finite sample situations. It is
compared to a similar proposition found in Barron et al. [2]. Proofs are postponed to
Section 5.



2 Boundary estimates 

2.1 A linear programming problem 

Let all the random variables be defined on a probability space $(\Omega, \mathcal{F}, P)$. The problem
under consideration is to estimate an unknown positive function $f : [0, 1] \to (0, \infty)$ on
the basis of observations $Z_N = (X_i, Y_i)_{i=1,\dots,N}$. These observations form an i.i.d. sequence,
with pairs $(X_i, Y_i)$ being uniformly distributed in the set $S$ defined as in (1). For the
sake of simplicity, we consider in the following the extension of $f$ to all of $\mathbb{R}$ by introducing
$f(x) = 0$ for all $x \notin [0, 1]$. Letting

$$C_f = \int_0^1 f(u)\,du = \int_{\mathbb{R}} f(u)\,du,$$

each variable $X_i$ is distributed in $[0, 1]$ with p.d.f. $f(\cdot)/C_f$, while $Y_i$ has the uniform
conditional distribution, given $X_i$, on the interval $[0, f(X_i)]$.
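The sampling model above can be simulated by rejection sampling. The following is a minimal sketch, not part of the original study: the function names and the tent-shaped frontier `example_frontier` are hypothetical choices made only for illustration, and `f_max` is assumed to be a known upper bound on the frontier.

```python
import random

def sample_under_frontier(f, f_max, n, rng=None):
    """Draw n points uniformly from S = {(x, y): 0 <= x <= 1, 0 <= y <= f(x)}.

    Rejection sampling: propose uniformly on the rectangle [0, 1] x [0, f_max]
    and keep the proposals that fall under the frontier. f_max must bound f.
    """
    if rng is None:
        rng = random.Random()
    points = []
    while len(points) < n:
        x = rng.random()             # candidate abscissa, uniform on [0, 1)
        y = rng.uniform(0.0, f_max)  # candidate ordinate, uniform on [0, f_max]
        if y <= f(x):                # accept only points lying under the frontier
            points.append((x, y))
    return points

def example_frontier(x):
    # Hypothetical tent-shaped frontier with values in [1.0, 1.5] (an assumption).
    return 1.0 + 0.5 * (1.0 - abs(2.0 * x - 1.0))

sample = sample_under_frontier(example_frontier, f_max=1.5, n=200,
                               rng=random.Random(0))
```

The accepted abscissas then have density proportional to the frontier, and each ordinate is uniform below the frontier at its abscissa, which matches the description of the marginal and conditional distributions given above.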

The considered estimate of the frontier is chosen from the family of functions:

$$\hat f_N(x) = \sum_{i=1}^{N} K_h(x - X_i)\,\alpha_i, \qquad K_h(t) = h^{-1} K(t/h), \qquad \alpha_i \ge 0, \quad i = 1, \dots, N, \tag{2}$$

where $K$ is a kernel function $K : \mathbb{R} \to [0, \infty)$ integrating to one and $h > 0$ is the bandwidth.
Each coefficient $\alpha_i$ represents the importance of the point $(X_i, Y_i)$ in the estimation. In
particular, if $\alpha_i \ne 0$, the corresponding point $(X_i, Y_i)$ can be called a support vector
by analogy with Support Vector Machines (SVM). We refer to Cristianini & Shawe-Taylor [9]
for a review on this topic and to Schölkopf & Smola [31], chapter 8, for
examples of application of SVM to quantile estimation. The constraint $\alpha_i \ge 0$ for all
$i = 1, \dots, N$ ensures that $\hat f_N(x) \ge 0$ for all $x \in \mathbb{R}$ and prevents the estimator from being
too irregular (see equation (50)). Let us remark that, since the kernel integrates to one,
the surface of the estimated support is given by

$$\int_{\mathbb{R}} \hat f_N(x)\,dx = \sum_{i=1}^{N} \alpha_i.$$
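As an illustration of the estimator family and of the surface identity, here is a small sketch. The triangular kernel and all numerical values (`xs`, `alphas`, `h`) are assumptions chosen for the example, not values taken from the paper.

```python
def triangular_kernel(t):
    """Even, nonnegative, Lipschitz kernel integrating to one, supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(t))

def kernel_estimate(x, xs, alphas, h, kernel=triangular_kernel):
    """Evaluate sum_i alpha_i * K_h(x - X_i), with K_h(t) = K(t / h) / h."""
    return sum(a * kernel((x - xi) / h) / h for xi, a in zip(xs, alphas))

# Hypothetical sample abscissas, coefficients and bandwidth.
xs = [0.2, 0.5, 0.7]
alphas = [0.3, 0.0, 0.4]   # the second point is not a support vector
h = 0.1

# Midpoint rule over [-1, 2], an interval that contains the support of the estimate.
n_grid = 30000
step = 3.0 / n_grid
surface = step * sum(
    kernel_estimate(-1.0 + (k + 0.5) * step, xs, alphas, h) for k in range(n_grid)
)
```

Up to quadrature error, `surface` agrees with `sum(alphas)`, which is why minimizing the surface of the estimated support amounts to minimizing a linear function of the coefficients.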



This suggests defining the vector parameter $\alpha = (\alpha_1, \dots, \alpha_N)^T$ from a linear program as
follows:

$$J_P^* = \min_{\alpha}\ \mathbf{1}^T \alpha \tag{3}$$

subject to

$$A\alpha \ge Y, \tag{4}$$
$$\alpha \ge 0. \tag{5}$$

The following notations have been introduced:

$$\mathbf{1} = (1, 1, \dots, 1)^T \in \mathbb{R}^N, \qquad A = \| K_h(X_i - X_j) \|_{i,j=1,\dots,N}, \qquad Y = (Y_1, \dots, Y_N)^T.$$

Hence, $A\alpha = (\hat f_N(X_1), \dots, \hat f_N(X_N))^T$, and the vector constraint (4) means that

$$\hat f_N(X_i) \ge Y_i, \qquad i = 1, \dots, N. \tag{6}$$

In other words, $\hat f_N$ defines the kernel estimate of the support covering all the points
and with smallest surface. In practice (see Section 4 for an illustration) the solution of
the linear program is sparse in the sense that $n(\alpha) = \mathrm{Card}\{\alpha_i \ne 0\}$ is small (for moderate
values of $h$), and thus the resulting estimate is fast to compute even for large samples.
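The structure of the linear program can be made concrete without an LP solver. The sketch below builds the matrix $A$, checks constraints (4) and (5), and compares two feasible points on a hypothetical four-point sample; the triangular kernel and all data values are assumptions. It does not solve the program, it only shows that any feasible vector gives an upper bound on the optimal value and that a sparser vector can have a smaller objective.

```python
def triangular_kernel(t):
    """Assumed kernel: even, nonnegative, integrates to one, support [-1, 1]."""
    return max(0.0, 1.0 - abs(t))

def gram_matrix(xs, h, kernel=triangular_kernel):
    """Matrix A with entries K_h(X_i - X_j) = K((X_i - X_j) / h) / h."""
    n = len(xs)
    return [[kernel((xs[i] - xs[j]) / h) / h for j in range(n)] for i in range(n)]

def is_feasible(A, alphas, ys, tol=1e-12):
    """Check alpha >= 0 and A alpha >= Y (up to a small numerical tolerance)."""
    if any(a < 0.0 for a in alphas):
        return False
    return all(
        sum(a_ij * a_j for a_ij, a_j in zip(row, alphas)) >= y - tol
        for row, y in zip(A, ys)
    )

# Hypothetical data and bandwidth.
xs = [0.1, 0.35, 0.4, 0.8]
ys = [0.5, 0.9, 0.7, 0.6]
h = 0.1
A = gram_matrix(xs, h)

# Trivial feasible point: each observation is covered by its own kernel bump,
# alpha_i = Y_i / K_h(0). Its objective is an upper bound on the LP optimum.
trivial = [y / A[i][i] for i, y in enumerate(ys)]

# Sparser feasible point: the bump centred at x = 0.35 is enlarged so that it
# also covers the observation at x = 0.4, whose coefficient is set to zero.
sparse = [0.05, 0.15, 0.0, 0.06]

objective_trivial = sum(trivial)
objective_sparse = sum(sparse)
```

Here the sparser vector is feasible and has the smaller objective, which gives some intuition for why the optimal solution tends to use few support vectors. In practice the minimization itself would be delegated to a linear programming routine.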



Let us note that the above described estimator (2)-(5) might be derived as the Maximum
Likelihood Estimate related to the approximation family (2). Indeed, the joint probability
density function for observations $Z_N$ given the parameter function $f$ can be written

$$p(Z_N \mid f) = \prod_{i=1}^{N} C_f^{-1}\, \mathbf{1}\{Y_i \le f(X_i)\}, \tag{7}$$

where $\mathbf{1}\{\cdot\}$ is the indicator function. Moreover,

$$C_f \big|_{f = \hat f_N} = \sum_{i=1}^{N} \alpha_i, \tag{8}$$

and therefore, the Log-Likelihood function is

$$L(\alpha) = \log p(Z_N \mid \hat f_N) = -N \log \sum_{i=1}^{N} \alpha_i + \sum_{i=1}^{N} \log \mathbf{1}\{Y_i \le \hat f_N(X_i)\}, \tag{9}$$

and its maximization over the set of non-negative parameters $\alpha$ is equivalent to problem
(3)-(5).



2.2 Comparison with other methods 

Let us remark that other solutions for estimating $\alpha$ in (2) have already been proposed.
Girard & Menneteau [19] considered a partition $\{I_r : 1 \le r \le k\}$ of $[0, 1]$, with
$k \to \infty$. For all $1 \le r \le k$, they introduce $D_r = \{(x, y) : x \in I_r,\ 0 \le y \le f(x)\}$, the slice
of $S$ built on $I_r$, $Y_r^* = \max\{Y_i ;\ (X_i, Y_i) \in D_r\}$, and the estimates

$$\alpha_i = \begin{cases} \lambda(I_r)\, Y_r^* & \text{if } \exists\, r \in \{1, \dots, k\} : Y_i = Y_r^*, \\ 0 & \text{otherwise}, \end{cases}$$

where $\lambda$ is the Lebesgue measure. They propose the following frontier estimate

$$\check f_N(x) = \sum_{r=1}^{k} K_h(x - x_r)\, \lambda(I_r)\, Y_r^*,$$

where $x_r$ is the center of $I_r$. This approach suffers from a practical difficulty: the choice
of the partition and more precisely the choice of $k$. In our context, solving the linear
problem (3)-(5) directly yields the support vectors.



In this sense, the estimate proposed in Barron et al. [2] is similar to $\hat f_N$. It is defined by
the Fourier expansion:

$$\hat g_N(x) = c_0 + \sum_{k=1}^{M} a_k \cos(2\pi k x) + \sum_{k=1}^{M} b_k \sin(2\pi k x), \tag{10}$$

where the vector of parameters $\beta = (c_0, a_1, \dots, a_M, b_1, \dots, b_M)^T$ is the solution of the linear
programming problem:

$$\min\ c_0 \quad \left( = \int_0^1 \hat g_N(x)\,dx \right) \tag{11}$$

under the constraints

$$\hat g_N(X_i) \ge Y_i, \qquad i = 1, \dots, N, \tag{12}$$

$$\sum_{k=1}^{M} k\,(|a_k| + |b_k|) \le L/(2\pi). \tag{13}$$

Therefore, $\hat g_N$ defines the Fourier estimate of the support covering all the points (equation
(12)), $L$-Lipschitzian (equation (13)) and with smallest surface (equation (11)). From the
theoretical point of view, this estimate benefits from minimax optimality. It is compared
to $\hat f_N$ on practical situations in Section 4 for different choices of parameters $M$, $L$ and $h$.
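To see how constraint (13) controls the regularity of a trigonometric polynomial of the form (10), note that its derivative is bounded in absolute value by $2\pi \sum_k k(|a_k| + |b_k|)$. The sketch below evaluates such an expansion and checks this budget; the coefficient values and function names are hypothetical and serve only as an example.

```python
import math

def fourier_estimate(x, c0, a, b):
    """Evaluate c0 + sum_k a_k cos(2 pi k x) + sum_k b_k sin(2 pi k x), k = 1..M."""
    return c0 + sum(
        a[k - 1] * math.cos(2.0 * math.pi * k * x)
        + b[k - 1] * math.sin(2.0 * math.pi * k * x)
        for k in range(1, len(a) + 1)
    )

def satisfies_lipschitz_budget(a, b, L):
    """Check sum_k k (|a_k| + |b_k|) <= L / (2 pi), which implies L-Lipschitz."""
    budget = sum(k * (abs(ak) + abs(bk))
                 for k, (ak, bk) in enumerate(zip(a, b), start=1))
    return budget <= L / (2.0 * math.pi)

# Hypothetical coefficients with M = 2.
c0, a, b = 0.5, [0.1, 0.0], [0.0, 0.05]

# Midpoint rule on [0, 1]: the oscillating terms integrate to zero, leaving c0.
n_grid = 1000
surface = sum(fourier_estimate((k + 0.5) / n_grid, c0, a, b)
              for k in range(n_grid)) / n_grid

# Largest finite-difference slope on a fine grid, to compare with L.
step = 1e-4
max_slope = max(
    abs(fourier_estimate((k + 1) * step, c0, a, b)
        - fourier_estimate(k * step, c0, a, b)) / step
    for k in range(10000)
)
```

With these coefficients the budget holds for `L = 2` but not for `L = 1`, and the observed slopes stay below 2, in line with the interpretation of (13) as a Lipschitz constraint.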



3 Main results 

The basic assumptions on the unknown boundary function are:
A1. $0 < f_{\min} \le f(x) \le f_{\max} < \infty$, for all $x \in [0, 1]$,
A2. $|f(x) - f(y)| \le L_f |x - y|$, for all $x, y \in [0, 1]$, $L_f < \infty$.

The following assumptions on the kernel function are introduced:
B1. $K(t) = K(-t) \ge 0$,

B2. $\int_{\mathbb{R}} K(t)\,dt = 1$,

B3. $|K(s) - K(t)| \le L_K |s - t|$, $L_K < \infty$,

B4. $C_0(K) = \int_{\mathbb{R}} K^2(t)\,dt < \infty$ and $C_1(K) = \int_{\mathbb{R}} t^2 K(t)\,dt < \infty$.

We denote $K_{\max} = \max_t K(t)$. In the following theorem the consistency of the estimate is
established with respect to the $L_1$ norm on the $[0, 1]$ interval.

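Theorem 1 Let $h \to 0$ and $\log N/(N h^2) \to 0$ as $N \to \infty$. Let the above mentioned
assumptions A and B hold true. Then estimator (2)-(5) has the following asymptotic
properties:

$$\limsup_{N \to \infty}\ \varepsilon_1^{-1}(N)\, \|\hat f_N - f\|_1 \le C(\omega) < \infty \quad \text{a.s.} \tag{14}$$

with

$$\varepsilon_1(N) = \max\left\{ h,\ \sqrt{\log N/(N h^2)} \right\}. \tag{15}$$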
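The $L_1$ error on $[0, 1]$ that appears in the theorem can be approximated numerically. The sketch below uses a midpoint rule; the two functions passed to it are hypothetical placeholders, chosen so that the exact value of the integral is known.

```python
def l1_error(f_hat, f, n_grid=10000):
    """Midpoint-rule approximation of the integral over [0, 1] of |f_hat - f|."""
    step = 1.0 / n_grid
    return step * sum(
        abs(f_hat((k + 0.5) * step) - f((k + 0.5) * step)) for k in range(n_grid)
    )

# Hypothetical pair of functions: the difference is |x - 0.5|, whose integral
# over [0, 1] equals 0.25.
def true_frontier(x):
    return 1.0

def estimated_frontier(x):
    return 1.0 + (x - 0.5)

error = l1_error(estimated_frontier, true_frontier)
```

For a kernel estimate, `f_hat` would be the fitted function and `f` the known frontier of a simulation; the grid size is an arbitrary choice trading accuracy against cost.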



Corollary 1 The maximum rate of convergence which is guaranteed by Theorem 1,

$$\|\hat f_N - f\|_1 = O\left( (\log N/N)^{1/4} \right),$$

is attained for

$$h \sim (\log N/N)^{1/4}. \tag{16}$$
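The bandwidth in the corollary comes from balancing the two terms of the rate function, assumed here to be $\varepsilon_1(N) = \max\{h, \sqrt{\log N/(N h^2)}\}$: setting $h = \sqrt{\log N/(N h^2)}$ gives $h^4 = \log N/N$. A short numerical check of this balance, with hypothetical function names:

```python
import math

def eps1(h, n):
    """Rate function max{h, sqrt(log n / (n h^2))} for bandwidth h and sample size n."""
    return max(h, math.sqrt(math.log(n) / (n * h * h)))

def balanced_bandwidth_1(n):
    """Bandwidth equalizing both terms of eps1: h = (log n / n)^(1/4)."""
    return (math.log(n) / n) ** 0.25

n = 100
h_star = balanced_bandwidth_1(n)
rate_at_h_star = eps1(h_star, n)
rate_smaller_h = eps1(0.5 * h_star, n)   # second term dominates
rate_larger_h = eps1(2.0 * h_star, n)    # first term dominates
```

Moving the bandwidth away from `h_star` in either direction increases the rate function, so the balanced choice minimizes the guaranteed bound. The constant in front of the bandwidth is not determined by this argument.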

This rate of convergence can be improved at the price of a slight modification of the
estimate. In the following, an additional constraint is considered in order to force
each coefficient $\alpha_i$ to be of order $1/N$. The counterpart of this modification is that the
new estimate $\tilde f_N$ will usually rely on more support vectors than $\hat f_N$.
Let us modify the estimator (2)-(5) as follows:

$$\tilde f_N(x) = \sum_{i=1}^{N} K_h(x - X_i)\,\alpha_i, \tag{17}$$

where the vector $\alpha = (\alpha_1, \dots, \alpha_N)^T$ is defined from the Modified Linear Program

$$J_{MP}^* = \min_{\alpha}\ \mathbf{1}^T \alpha \tag{18}$$

subject to

$$A\alpha \ge Y, \tag{19}$$
$$0 \le \alpha \le C_\alpha/N, \tag{20}$$

with a constant

$$C_\alpha > f_{\max}. \tag{21}$$

Remark. In fact, we need to ensure $C_\alpha > C_f$, which is implied by (21).

The modified estimator (17)-(21) differs from that of (2)-(5) by additionally bounding
each $\alpha_i$ from above, see constraints (20). Below we prove that under condition (21) as
well as a finite support kernel $K(\cdot)$, the Modified Linear Program (18)-(20) has a nonempty
set of admissible solutions with the same upper bound as (25) and a better lower bound
than (40).



Theorem 2 Let $h \to 0$ and $\log N/(N h) \to 0$ as $N \to \infty$. Let the kernel function $K(\cdot)$ have a
finite support, that is $K(t) = 0\ \forall |t| > 1$, and let the assumptions A and B hold true. Then
estimator (17)-(21) has the following asymptotic properties:

$$\limsup_{N \to \infty}\ \varepsilon_2^{-1}(N)\, \|\tilde f_N - f\|_1 \le C(\omega) < \infty \quad \text{a.s.} \tag{22}$$

with

$$\varepsilon_2(N) = \max\left\{ h,\ \sqrt{\log N/(N h)} \right\}. \tag{23}$$

Remark. The support of $K(\cdot)$ is fixed to be the interval $[-1, 1]$ without loss of generality.
Corollary 2 The maximum rate of convergence which is guaranteed by Theorem 2,

$$\|\tilde f_N - f\|_1 = O\left( (\log N/N)^{1/3} \right),$$

is attained for

$$h \sim (\log N/N)^{1/3}. \tag{24}$$
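To get a feeling for the gain brought by the modified estimator, the two guaranteed rates can be compared numerically. The sketch assumes the rate function $\varepsilon_2(N) = \max\{h, \sqrt{\log N/(N h)}\}$ and the exponents $1/4$ and $1/3$ obtained by balancing terms; constants are ignored, so only the orders of magnitude are meaningful.

```python
import math

def eps2(h, n):
    """Rate function max{h, sqrt(log n / (n h))} of the modified estimator."""
    return max(h, math.sqrt(math.log(n) / (n * h)))

def guaranteed_rates(n):
    """Return ((log n / n)^(1/4), (log n / n)^(1/3)): original and modified rates."""
    ratio = math.log(n) / n
    return ratio ** 0.25, ratio ** (1.0 / 3.0)

# Print both rates for a few sample sizes (constants are not included).
for n in (25, 100, 10000):
    rate_original, rate_modified = guaranteed_rates(n)
    print(n, round(rate_original, 3), round(rate_modified, 3))
```

Since $\log N/N < 1$ for the sample sizes considered, the exponent $1/3$ always yields the smaller bound, and the balanced bandwidth for the modified estimator is $h = (\log N/N)^{1/3}$.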



4 Numerical experiments 



The simulations presented here illustrate the behaviour of the kernel estimator $\hat f_N$ compared
to the estimator $\hat g_N$ based on Fourier expansions proposed in Barron et al. [2].
Since the Fourier estimator requires the unknown function to be periodic, we choose $f$
such that $f(0) = f(1)$. Besides, to avoid boundary effects on the input domain, we consider
functions that are nearly zero when $x$ is close to 0 or 1. In more general situations,
boundary corrections should be implemented (see Cowling & Hall [8]). The chosen
function

$$f(x) = 0.1 + 5(x - 0.1)\mathbf{1}\{x > 0.1\} - 5(x - 0.2)\mathbf{1}\{x > 0.2\} + (x - 0.5)\mathbf{1}\{x > 0.5\} - 9(x - 0.8)\mathbf{1}\{x > 0.8\} + 8(x - 0.9)\mathbf{1}\{x > 0.9\}$$

is piecewise linear and Lipschitzian with a Lipschitz constant $L_f = 8$. For each
estimate, the $L_1$ error $\Delta_N$ as well as the number of effective parameters $n_p$ (that is
$n_\alpha = \mathrm{Card}\{\alpha_i \ne 0\}$ and $n_\beta = \mathrm{Card}\{\beta_j \ne 0\}$) are evaluated for $N = 25$ and $N = 100$. The average value
and the standard deviation of these quantities are computed on 1000 replications in the
first case and on 100 replications in the second one. The estimation is carried out with
different values of the parameters, namely $h$ for the kernel estimate, and $L$ and $M$ for
the Fourier estimate. The adaptive choice of these parameters is not implemented in this
setting. The results are summarized in Tables 1 and 2. The lowest error is emphasized for
each estimate. It can be noted that the mean $L_1$ errors of both estimates are very similar.
In fact, the kernel estimate seems to give a slightly lower error for a small number of points
and the Fourier estimate yields better results for large sample size situations, confirming
its asymptotic optimality. Let us note that the standard deviation of the $L_1$ error is in
general smaller for the kernel estimate. Regarding the number of parameters, the kernel
estimate seems to be more parsimonious than the Fourier estimate.

5 Proofs

The proof of Theorem 1, which is given in Subsection 5.3, is based on both upper and lower
bounds derived below.

5.1 Upper bound for $\hat f_N$

Lemma 1 Let $h \to 0$ and $\log N/(N h) \to 0$ as $N \to \infty$. Let the above mentioned
assumptions A and B hold true. Then for almost all $\omega \in \Omega$ there exists a finite number
$N_0(\omega)$ such that

$$J_P^* \le C_f + O(h) + O\left( \sqrt{\log N/(N h)} \right), \qquad \forall N \ge N_0(\omega), \tag{25}$$

with both $O(h)$ and $O\left( \sqrt{\log N/(N h)} \right)$ non-random.




estimate  h      L       M      mean(Δ_N)  st-dev(Δ_N)  mean(n_p)  st-dev(n_p)
kernel    0.100  -       -      0.123      0.038        5.263      0.970
          0.120  -       -      0.116      0.034        4.490      0.841
          0.140  -       -      0.112      0.033        3.841      0.683
          0.160  -       -      0.115      0.031        3.420      0.636
          0.180  -       -      0.123      0.027        3.120      0.657
          0.200  -       -      0.132      0.023        2.863      0.645
Fourier   -      3.000   4.000  0.144      0.035        4.567      0.777
          -      5.000   4.000  0.119      0.043        5.508      0.986
          -      7.000   4.000  0.127      0.043        6.572      1.217
          -      9.000   4.000  0.138      0.044        7.235      1.284
          -      11.000  4.000  0.147      0.046        7.592      1.249
          -      13.000  4.000  0.154      0.046        7.815      1.210
Fourier   -      3.000   8.000  0.144      0.036        4.581      0.800
          -      5.000   8.000  0.121      0.044        5.571      1.057
          -      7.000   8.000  0.129      0.044        6.730      1.379
          -      9.000   8.000  0.142      0.046        7.632      1.669
          -      11.000  8.000  0.153      0.047        8.314      1.873
          -      13.000  8.000  0.163      0.048        8.859      2.050

Table 1: Results for 1000 simulations with N = 25 points.

estimate  h      L       M      mean(Δ_N)  st-dev(Δ_N)  mean(n_p)  st-dev(n_p)
kernel    0.050  -       -      0.073      0.016        13.700     1.560
          0.070  -       -      0.060      0.014        9.890      1.246
          0.090  -       -      0.060      0.013        7.350      1.132
          0.110  -       -      0.063      0.012        5.820      0.989
          0.130  -       -      0.075      0.012        4.690      0.734
          0.150  -       -      0.085      0.013        3.960      0.549
Fourier   -      3.000   4.000  0.129      0.021        5.120      0.700
          -      5.000   4.000  0.078      0.020        5.790      0.756
          -      7.000   4.000  0.061      0.012        7.630      0.960
          -      9.000   4.000  0.064      0.013        8.700      0.560
          -      11.000  4.000  0.069      0.015        8.880      0.409
          -      13.000  4.000  0.071      0.016        8.950      0.297
Fourier   -      3.000   8.000  0.129      0.021        5.160      0.762
          -      5.000   8.000  0.078      0.020        5.920      0.849
          -      7.000   8.000  0.059      0.013        8.070      1.350
          -      9.000   8.000  0.059      0.015        10.470     1.630
          -      11.000  8.000  0.063      0.015        12.090     1.682
          -      13.000  8.000  0.069      0.015        13.620     2.068

Table 2: Results for 100 simulations with N = 100 points.

Proof of Lemma 1. 1. Since the kernel function $K(\cdot)$ is supposed to be even, the matrix
$A$ is symmetric, and the dual problem associated to (3)-(5) can be written:

$$J_D^* = \max_{\lambda}\ Y^T \lambda \tag{26}$$

subject to

$$A\lambda \le \mathbf{1}, \tag{27}$$
$$\lambda \ge 0. \tag{28}$$

Let us replace the vector $Y$ in (26) by

$$F = (f(X_1), \dots, f(X_N))^T \tag{29}$$

and, moreover, change the vector constraint (27) into a scalar one which is directly obtained
by just summing all $N$ rows of (27). Thus, we arrive at the modified dual problem

$$J_{MD}^* = \max_{\lambda}\ F^T \lambda \tag{30}$$

subject to

$$\mathbf{1}^T A \lambda \le N, \tag{31}$$
$$\lambda \ge 0. \tag{32}$$

Since $F \ge Y$ and according to the well known Duality Theorem (see e.g. Hiriart-Urruty
& Lemaréchal [26], chapter 7):

$$J_P^* = J_D^* \le J_{MD}^*. \tag{33}$$

Now we derive an upper bound on $J_{MD}^*$.

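2. Let us arbitrarily fix a vector $\lambda$ which meets the constraints (31), (32) and then write
inequality (31) in the equivalent form as follows:

$$\frac{1}{N} \sum_{j=1}^{N} \lambda_j \Big( K_h(0) + \sum_{i \ne j} K_h(X_i - X_j) \Big) \le 1, \tag{34}$$

or, equivalently,

$$\frac{1}{N} \sum_{j=1}^{N} \lambda_j \Big( K_h(0) + \sum_{i \ne j} E\{K_h(X_i - X_j) \mid X_j\} + \sum_{i \ne j} \xi_{ij} \Big) \le 1, \tag{35}$$

with

$$\xi_{ij} = K_h(X_i - X_j) - E\{K_h(X_i - X_j) \mid X_j\}.$$

Now apply the upper bound (96), proved in Lemma 5 (see Appendix), to the relation (35),
taking into account that $K(0) \ge 0$ and

$$E\{K_h(X_i - X_j) \mid X_j\} = \int_{\mathbb{R}} \frac{1}{h} K\Big( \frac{u - X_j}{h} \Big) \frac{f(u)}{C_f}\,du \tag{36}$$
$$= \frac{1}{C_f} \int_{\mathbb{R}} K(t)\, f(X_j + ht)\,dt$$
$$= \frac{1}{C_f} \big( f(X_j) + O(h) \big), \tag{37}$$

with non-random $O(h)$. Hence,

$$\frac{N-1}{C_f N} \sum_{j=1}^{N} \lambda_j \Big( f(X_j) + O(h) - C \sqrt{\frac{\log N}{N h}} \Big) \le 1, \qquad \forall N \ge N_2(\omega), \tag{38}$$

with non-random constant $C$. First, since $f(X_j) \ge f_{\min}$ while the two remainder terms
tend to zero, inequality (38) implies

$$\sum_{j=1}^{N} \lambda_j \le \frac{2 C_f}{f_{\min}}, \qquad \forall N \ge N_3(\omega), \tag{39}$$

with almost surely finite $N_3(\omega) \ge N_2(\omega)$. Second, (39) and (38) imply the upper bound (25),
and Lemma 1 is proved. ■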



5.2 Lower bound for $\hat f_N$

Lemma 2 Under the assumptions of Theorem 1, for almost all $\omega \in \Omega$ there exists a finite
number $N_1(\omega)$ such that for each $x \in (0, 1)$

$$\hat f_N(x) \ge f(x) - O\left( \sqrt{\log N/(N h^2)} \right), \qquad \forall N \ge N_1(\omega), \tag{40}$$

where $O(\cdot)$ does not depend on $x$.



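Proof of Lemma 2. 1. Suppose that for some non-random $\delta_x > 0$ there exists (with
probability one) an integer $i_k \in \{1, \dots, N\}$ such that

$$|x - X_{i_k}| \le \delta_x. \tag{41}$$

Then, the estimation error at a point $x \in (0, 1)$ can be expanded as

$$f(x) - \hat f_N(x) = [f(x) - f(X_{i_k})] \tag{42}$$
$$\qquad + [f(X_{i_k}) - \hat f_N(X_{i_k})] \tag{43}$$
$$\qquad + [\hat f_N(X_{i_k}) - \hat f_N(x)]. \tag{44}$$

The term in the right hand side of (42) may be bounded as follows:

$$|f(x) - f(X_{i_k})| \le L_f |x - X_{i_k}| \le L_f \delta_x, \tag{45}$$

as well as the term (44):

$$|\hat f_N(X_{i_k}) - \hat f_N(x)| \le L_{\hat f}\, |x - X_{i_k}| \le L_{\hat f}\, \delta_x, \tag{46}$$

with a Lipschitz constant $L_{\hat f}$ for the function estimate $\hat f_N(x)$, for which a bound is derived below.
In order to bound (43) assume that for some non-random $\delta_y > 0$,

$$Y_{i_k} \ge f(X_{i_k}) - \delta_y \quad \text{a.s.} \tag{47}$$

Recall that $\hat f_N(X_{i_k}) \ge Y_{i_k}$ due to (4) or (6). Thus,

$$f(X_{i_k}) - \hat f_N(X_{i_k}) \le (Y_{i_k} + \delta_y) - Y_{i_k} = \delta_y. \tag{48}$$

Combining all these bounds we obtain from (42)-(44) that for all $N \ge N_0(\omega)$,

$$f(x) - \hat f_N(x) \le \delta_y + \big( L_f + L_{\hat f} \big)\, \delta_x. \tag{49}$$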

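2. Note that a straightforward evaluation of the Lipschitz constant for the estimate
function yields:

$$|\hat f_N(u) - \hat f_N(v)| \le \sum_{i=1}^{N} \alpha_i\, |K_h(u - X_i) - K_h(v - X_i)| \tag{50}$$
$$\le \frac{L_K}{h^2} \sum_{i=1}^{N} \alpha_i\, |u - v|. \tag{51}$$

Hence, due to the upper bound (25), we obtain almost surely

$$L_{\hat f} = \frac{L_K C_f}{h^2}\,(1 + o(1)), \qquad \forall N \ge N_0(\omega), \tag{52}$$

with almost surely finite $N_0(\omega)$.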

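3. Now, we demonstrate that under an appropriate definition of $\delta_x$ and $\delta_y$ as functions of $h$
and $N$ there exists an almost surely finite random integer $N_0(\omega)$ such that

$$\forall N \ge N_0(\omega), \quad \exists\, i_k \in \{1, \dots, N\} : (X_{i_k}, Y_{i_k}) \in \Delta(x), \tag{53}$$

with

$$\Delta(x) = \{(u, v) : |x - u| \le \delta_x,\ f(u) - \delta_y \le v \le f(u)\}. \tag{54}$$

Indeed, introduce

$$\delta_y = \left( \frac{\kappa \log N}{N h^2} \right)^{1/2} \tag{55}$$

and

$$\delta_x = h^2 \delta_y. \tag{56}$$

Then,

$$P\{(X_i, Y_i) \notin \Delta(x)\ \ \forall i = 1, \dots, N\} \le \Big( 1 - \frac{\delta_x \delta_y}{C_f}(1 + o(1)) \Big)^N \le \exp\Big\{ -\frac{1 + o(1)}{C_f}\, N h^2 \delta_y^2 \Big\} \le N^{-\kappa/(2 C_f)}. \tag{57}$$

Hence, fixing

$$\kappa > 2 C_f \tag{58}$$

implies the convergence of the series

$$\sum_{N=1}^{\infty} P\{(X_i, Y_i) \notin \Delta(x)\ \ \forall i = 1, \dots, N\} < \infty, \tag{59}$$

which, due to the Borel-Cantelli lemma, implies the existence of an almost surely finite $N_0(\omega)$
such that relation (53) holds true.

4. Therefore, substituting relations (52), (55), and (56) into (49) leads to the lower bound

$$\hat f_N(x) \ge f(x) - \delta_y - O(h^{-2})\,\delta_x = f(x) - O\left( \sqrt{\log N/(N h^2)} \right), \tag{60}$$

with non-random term $O(\cdot)$ independent of $x$. ■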



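5.3 Proof of Theorem 1

1. Since $|u| = u - 2u\,\mathbf{1}\{u < 0\}$, the $L_1$-norm of the estimation error can be expanded as

$$\|\hat f_N - f\|_1 = \int_0^1 \big[ \hat f_N(x) - f(x) \big]\,dx \tag{61}$$
$$\qquad + 2 \int_0^1 \big[ f(x) - \hat f_N(x) \big]\, \mathbf{1}\{\hat f_N(x) < f(x)\}\,dx. \tag{62}$$

2. Applying Lemma 1 to the right hand side of (61) yields

$$\limsup_{N \to \infty}\ \varepsilon_{UB}^{-1}(N) \Big( \int_0^1 \big[ \hat f_N(x) - f(x) \big]\,dx \Big) \le \mathrm{const} < \infty \quad \text{a.s.} \tag{63}$$

with

$$\varepsilon_{UB}(N) = \max\left\{ h,\ \sqrt{\log N/(N h)} \right\}. \tag{64}$$

3. In order to obtain a similar result for the term (62), note that Lemma 2 implies

$$\zeta_N(x) = \varepsilon_{LB}^{-1}(N)\, \big[ f(x) - \hat f_N(x) \big] \le C(\omega) < \infty \quad \text{a.s.}$$

uniformly with respect to both $x$ and $N$, with

$$\varepsilon_{LB}(N) = \sqrt{\log N/(N h^2)}. \tag{65}$$

Hence, one may apply the Fatou lemma, taking into account that $u\,\mathbf{1}\{u > 0\}$ is a continuous,
monotone function:

$$\limsup_{N \to \infty}\ \varepsilon_{LB}^{-1}(N) \int_0^1 \big[ f(x) - \hat f_N(x) \big]\, \mathbf{1}\{\hat f_N(x) < f(x)\}\,dx \tag{66}$$
$$\le \int_0^1 \limsup_{N \to \infty}\ \zeta_N(x)\, \mathbf{1}\{\zeta_N(x) > 0\}\,dx \tag{67}$$
$$\le C(\omega) < \infty \quad \text{a.s.} \tag{68}$$

4. Thus, the obtained relations together with (61) and (62) imply (14), (15), and Theorem
1 is proved. ■

The proof of Theorem 2, which is given in Subsection 5.6, is based on ideas similar to
those of Theorem 1; see below.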



5.4 Upper bound for $\tilde f_N$

Since the admissible set (19), (20) is narrower than that of (4), (5), it is
important to demonstrate that the upper bound remains at least the same.

Lemma 3 Let the assumptions of Theorem 2 hold true. Then for almost all $\omega \in \Omega$ there
exists a finite number $N_0(\omega)$ such that

$$J_{MP}^* \le C_f + O(h) + O\left( \sqrt{\log N/(N h)} \right), \qquad \forall N \ge N_0(\omega), \tag{69}$$

with both $O(h)$ and $O\left( \sqrt{\log N/(N h)} \right)$ non-random.



Proof of Lemma 3. 1. Since the kernel function $K(t)$ is supposed to be even, the matrix
$A$ is symmetric, and the dual problem related to (18)-(20) reads

$$J_{MD}^* = \max_{\lambda,\,\nu}\ \big( Y^T \lambda - C_\alpha N^{-1}\, \mathbf{1}^T \nu \big) \tag{70}$$

subject to

$$A\lambda - \nu \le \mathbf{1}, \tag{71}$$
$$\lambda \ge 0, \tag{72}$$
$$\nu \ge 0. \tag{73}$$

Let us replace the vector $Y$ in (70) by

$$F = (f(X_1), \dots, f(X_N))^T \tag{74}$$

and, moreover, change the vector constraint (71) into a scalar one which is directly obtained
by just summing all rows of (71). Thus we arrive at the modified dual problem

$$J_{MMD}^* = \max_{\lambda,\,\nu}\ \big( F^T \lambda - C_\alpha N^{-1}\, \mathbf{1}^T \nu \big) \tag{75}$$

subject to

$$\mathbf{1}^T A \lambda - \mathbf{1}^T \nu \le N, \tag{76}$$
$$\lambda \ge 0, \tag{77}$$
$$\nu \ge 0. \tag{78}$$

Since $F \ge Y$ and according to the well known Duality Theorem,

$$J_{MP}^* = J_{MD}^* \le J_{MMD}^*. \tag{79}$$

Now, we derive an upper bound on $J_{MMD}^*$.

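2. Let us arbitrarily fix $(\lambda, \nu)$ which meet the constraints (76)-(78) and then write
inequality (76) in the equivalent form as follows:

$$\frac{1}{N} \sum_{j=1}^{N} \lambda_j \Big( K_h(0) + \sum_{i \ne j} K_h(X_i - X_j) \Big) \le 1 + \frac{1}{N}\, \mathbf{1}^T \nu, \tag{80}$$

or, equivalently,

$$\frac{1}{N} \sum_{j=1}^{N} \lambda_j \Big( K_h(0) + \sum_{i \ne j} E\{K_h(X_i - X_j) \mid X_j\} + \sum_{i \ne j} \xi_{ij} \Big) \le 1 + \frac{1}{N}\, \mathbf{1}^T \nu, \tag{81}$$

with

$$\xi_{ij} = K_h(X_i - X_j) - E\{K_h(X_i - X_j) \mid X_j\}. \tag{82}$$

Now apply the upper bound (96), proved in Lemma 5, to the relation (81), taking into account
that $K(0) \ge 0$ as well as (36)-(37). Hence,

$$\frac{N-1}{C_f N} \sum_{j=1}^{N} \lambda_j \Big( f(X_j) + O(h) - C \sqrt{\frac{\log N}{N h}} \Big) \le 1 + \frac{1}{N}\, \mathbf{1}^T \nu, \qquad \forall N \ge N_2(\omega), \tag{83}$$

with non-random constant $C$. First, from inequality (83) it follows that

$$\sum_{j=1}^{N} \lambda_j \le \frac{2 C_f}{f_{\min}} \Big( 1 + \frac{1}{N}\, \mathbf{1}^T \nu \Big), \qquad \forall N \ge N_3(\omega), \tag{84}$$

with almost surely finite $N_3(\omega) \ge N_2(\omega)$. Consequently, as it follows from (83), for almost
all $\omega \in \Omega$ and sufficiently large $N$

$$F^T \lambda - \frac{C_\alpha}{N}\, \mathbf{1}^T \nu \le C_f + O(h) + O\left( \sqrt{\log N/(N h)} \right) \tag{85}$$
$$\qquad - \big( C_\alpha - C_f (1 + o(1)) \big)\, \frac{1}{N}\, \mathbf{1}^T \nu, \tag{86}$$

with non-random $O\left( \sqrt{\log N/(N h)} \right)$. Thus, (79) and (85) prove the upper bound (69),
since (21) implies $C_\alpha > C_f$. ■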



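5.5 Lower bound for $\tilde f_N$

Lemma 4 Under the assumptions of Theorem 2, for almost all $\omega \in \Omega$ there exists a finite
number $N_1(\omega)$ such that for each $x \in (0, 1)$

$$\tilde f_N(x) \ge f(x) - O\left( \sqrt{\log N/(N h)} \right), \qquad \forall N \ge N_1(\omega), \tag{87}$$

where $O(\cdot)$ does not depend on $x$.

The proof of Lemma 4 is given in the same manner as that of Lemma 2. The only essential
difference is in a better Lipschitz constant for $\tilde f_N(x)$. Indeed, for any $u, v \in (0, 1)$

$$|\tilde f_N(u) - \tilde f_N(v)| \le \sum_{i=1}^{N} \alpha_i\, |K_h(u - X_i) - K_h(v - X_i)| \tag{88}$$
$$\le \frac{C_\alpha}{N}\, \frac{L_K}{h^2}\, \mathrm{Card}\big( I(u) \cup I(v) \big)\, |u - v|, \tag{89}$$

with

$$I(\cdot) = \{ i : K_h(\cdot - X_i) \ne 0 \}. \tag{90}$$

From the Strong Law of Large Numbers,

$$\mathrm{Card}\, I(\cdot) \le \frac{f_{\max}}{C_f}\, 2h\, N\, (1 + o(1)) \quad \text{a.s.} \tag{91}$$

and thus,

$$L_{\tilde f} \le \frac{L_K C_\alpha}{h^2 N}\, \frac{4 f_{\max}}{C_f}\, N h\, (1 + o(1)) = O(h^{-1}) \quad \text{a.s.} \tag{92}$$

by the upper bound (20) on $\alpha$.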



5.6 Proof of Theorem 2

Theorem 2 is proved in the same manner as Theorem 1, based on Lemmas 3 and 4.
Note that the lower bound from Lemma 4 is now no worse than the
upper bound, which is the result of the estimator modification.

Note: The result (22)-(23) of Theorem 2 may also be proved for differentiable kernel
functions with infinite support which meet the condition

$$|K'(t)| \le \mu K(t), \qquad \forall t \in \mathbb{R}, \tag{93}$$

with some constant $\mu$. Indeed, (93) implies

$$|\tilde f_N'(x)| = \Big| \frac{1}{h^2} \sum_{i=1}^{N} \alpha_i\, K'\Big( \frac{x - X_i}{h} \Big) \Big| \le \frac{\mu}{h}\, \tilde f_N(x). \tag{94}$$

Consequently, when the estimate function $\tilde f_N(x)$ is bounded from above, its Lipschitz constant
is of order $O(h^{-1})$, that is, the same as in (92).

6 Appendix 

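Lemma 5 Let the assumptions A and B hold true and the constant $C$ be sufficiently large.
Define the random variables

$$\xi_{ij} = K_h(X_i - X_j) - E\{K_h(X_i - X_j) \mid X_j\}, \qquad i \ne j. \tag{95}$$

Then, for almost all $\omega \in \Omega$ there exists a finite integer $N_2(\omega)$ such that

$$\max_{j=1,\dots,N}\ \Big| \frac{1}{N} \sum_{i \ne j} \xi_{ij} \Big| \le C \sqrt{\log N/(N h)}, \qquad \forall N \ge N_2(\omega). \tag{96}$$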



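Proof of Lemma 5. Note that for each $j = 1, \dots, N$ the zero-mean random variables
$(\xi_{ij})_{i \ne j}$, which are i.i.d. conditionally on $X_j$, have the following properties:

$$|\xi_{ij}| \le h^{-1} K_{\max}, \tag{97}$$

and

$$E\{\xi_{ij}^2 \mid X_j\} \le \int_0^1 K_h^2(u - X_j)\, \frac{f(u)}{C_f}\,du \le \frac{1}{h C_f} \int_{\mathbb{R}} K^2(t)\, f(X_j + ht)\,dt \le \frac{f_{\max}}{h C_f}\, C_0(K). \tag{98}$$

Thus, one may apply the Bernstein inequality (see, e.g., Birgé & Massart [1] or
Bosq [6], Theorem 2.6), which leads to

$$P\Big\{ \Big| \frac{1}{N} \sum_{i \ne j} \xi_{ij} \Big| > \mu \ \Big|\ X_j \Big\} \le 2 \exp\Big\{ -\frac{N \mu^2}{2(\sigma^2 + b \mu/3)} \Big\}, \qquad \sigma^2 = \frac{f_{\max}}{h C_f}\, C_0(K), \quad b = h^{-1} K_{\max}.$$

Let us put

$$\mu = \sqrt{\frac{\kappa \log N}{N h}}, \tag{99}$$

with sufficiently large $\kappa$ which is defined below. Hence, for all $N \ge N_1$, $N_1$ being a
sufficiently large non-random integer,

$$P\Big\{ \Big| \frac{1}{N} \sum_{i \ne j} \xi_{ij} \Big| > \sqrt{\frac{\kappa \log N}{N h}} \ \Big|\ X_j \Big\} \le 2 N^{-\kappa_1}, \tag{100}$$

with

$$\kappa_1 = \frac{\kappa\, C_f}{4 f_{\max}\, C_0(K)}. \tag{101}$$

Therefore,

$$P\Big\{ \max_{j=1,\dots,N} \Big| \frac{1}{N} \sum_{i \ne j} \xi_{ij} \Big| > \sqrt{\frac{\kappa \log N}{N h}} \Big\} \le \sum_{j=1}^{N} P\Big\{ \Big| \frac{1}{N} \sum_{i \ne j} \xi_{ij} \Big| > \sqrt{\frac{\kappa \log N}{N h}} \Big\} \tag{102}$$
$$\le 2 N^{1 - \kappa_1}. \tag{103}$$

Consequently, any fixed parameter

$$\kappa > \frac{8 f_{\max}\, C_0(K)}{C_f} \tag{104}$$

ensures $\kappa_1 > 2$, which implies the convergence of the series $\sum_N N^{1 - \kappa_1}$ and, due to the
Borel-Cantelli lemma, the desired result (96). ■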